Tensoral: a system for post-processing
turbulence simulation data 

By Eliot Dresselhaus 

1. Motivations and objectives 

1.1 General motivations 

Many computer simulations in engineering and science, and especially in
computational fluid dynamics (CFD), produce huge quantities of numerical data.
These datasets are often so large (consider the roughly 1 Gbyte needed for a single
scalar variable in $512^3$ isotropic turbulence simulations) as to make even
relatively simple post-processing unwieldy. The data, once computed and
quality-assured, are most likely analyzed by only a few people (usually only the
simulation authors) and from at most a few perspectives (usually only those with
which the authors are most concerned and knowledgeable). As a result, much useful
numerical data is under-utilized. Since future state-of-the-art simulations will
produce even larger datasets, will use more complex flow geometries, and will be
performed on more complex super-computers (for example, super-computers with many
loosely coupled processors), data management issues will become increasingly
cumbersome.
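The "roughly 1 Gbyte" figure can be checked with a line of arithmetic. The sketch below assumes 8 bytes per stored value (one 64-bit word); the text does not state the word size, so treat that as an assumption rather than a property of the datasets.

```python
# Storage needed for one scalar field on an n x n x n grid.
# Assumption: 8 bytes per value (64-bit words); the word size is not given in the text.
def scalar_field_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Return the number of bytes needed for one scalar on an n^3 grid."""
    return n ** 3 * bytes_per_value

size = scalar_field_bytes(512)
print(size)            # 1073741824
print(size / 2 ** 30)  # 1.0, i.e. exactly 2^30 bytes
```

A three-component velocity field at the same resolution would need three times this amount under the same assumption.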

My goal is to provide software which will automate the present and future task of
managing and post-processing large turbulence datasets. My research has focused
on the development of these software tools, specifically through the development
of a very high-level language called “Tensoral”. The ultimate goal of Tensoral is to
convert high-level mathematical expressions (tensor algebra, calculus, and statistics)
into efficient low-level programs which numerically calculate these expressions given
simulation datasets. For example, a user’s program to calculate vorticity would be
coded in Tensoral as something akin to $\omega = \nabla \times u$. Tensoral would
process this “program” (at least for the case of homogeneous turbulence on the
Cray Y-MP) into a roughly 200-line Vectoral program to calculate vorticity.

This approach to the database and post-processing problem has several advantages.
Using Tensoral, the numerical and data management details of a simulation
are shielded from the concerns of the end user. This shielding is carried out without
sacrificing post-processor efficiency and robustness. Another advantage of Tensoral
is that its very high-level nature lends itself to portability across a wide variety of
computing (and super-computing) platforms. This is especially important considering
the rapidity of changes in supercomputing hardware.

1.2 Specific motivations and objectives 

The fundamental scientific goal of fluids research is to reach an understanding of
the correlation between the Navier-Stokes equations

$$\partial_t u + (u \cdot \nabla) u = -\nabla p / \rho + \nu \nabla^2 u$$

(whether incompressible or compressible) and the observed and theoretically
predicted features of the velocity field $u(x,t)$, the pressure field $p(x,t)$, and
other related quantities. $u(x,t)$ is the fundamental hydrodynamic quantity: all
other quantities, both dynamic and statistical, are derived from it (at least for
incompressible flows). Turbulence theory, modeling, and experiments are all phrased
in terms of quantities derived from the velocity field $u(x,t)$. The quantities
arising in this theory, modeling, and experiments are precisely the ones we desire
to compute; they include:

• the vorticity vector field, $\omega(x,t) = \nabla \times u(x,t)$,

• the strain rate tensor, $S_{ij} = (\partial_i u_j + \partial_j u_i)/2$,

• the pressure scalar, $p(x,t) = -\nabla^{-2} \nabla \cdot (u \cdot \nabla) u$ (for incompressible flows),

• the kinetic energy dissipation density, $\epsilon(x,t) = \nu \sum_{ij} S_{ij} S_{ij}$ (incompressible),

• the wave-space velocity field $\hat{u}(k,t)$ and associated energy spectrum $E(k) = |\hat{u}(k)|^2$,

• the density scalar $\rho(x,t)$ (for compressible flows),

• the stream function $\psi$, $\nabla \times \psi = u$,

• the helicity density, $u \cdot \omega$.

Statistical quantities of interest include: 

• mean velocity profiles and other averaged velocity components, 

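• the Reynolds stress tensor, $R_{ij} = \langle u_i u_j \rangle$,

• the correlation tensor, $\langle u_i(x) u_j(x + x') \rangle$,

• the total energy dissipation, $\epsilon \propto \sum_{ij} \langle S_{ij} S_{ij} \rangle$,

• the enstrophy (mean square vorticity), $\langle \omega^2 \rangle$,

• the pressure strain correlation, $\langle p S_{ij} \rangle$.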
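As a minimal sketch of how a few of these definitions translate into arithmetic, the following evaluates the strain rate and vorticity at a single point from a velocity gradient, and a Reynolds stress as a plain average over samples. The layout `grad[i][j]` for $\partial_i u_j$, the function names, and the sample data are all illustrative assumptions; nothing here is taken from an actual simulation database.

```python
def levi_civita(i: int, j: int, k: int) -> int:
    """Permutation symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def strain(grad):
    """S_ij = (d_i u_j + d_j u_i) / 2, with grad[i][j] = d_i u_j (assumed layout)."""
    return [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]

def vorticity(grad):
    """omega_k = eps_kij d_i u_j, i.e. the curl of u."""
    return [sum(levi_civita(k, i, j) * grad[i][j]
                for i in range(3) for j in range(3))
            for k in range(3)]

def mean(values):
    """Plain arithmetic mean, standing in for the averaging brackets <...>."""
    values = list(values)
    return sum(values) / len(values)

def reynolds_stress(samples):
    """R_ij = <u_i u_j>, averaged over a list of velocity samples."""
    return [[mean(u[i] * u[j] for u in samples) for j in range(3)]
            for i in range(3)]

# Solid-body rotation about z, u = (-y, x, 0): zero strain, vorticity (0, 0, 2).
grad = [[0.0, 1.0, 0.0],    # d_x u_x, d_x u_y, d_x u_z
        [-1.0, 0.0, 0.0],   # d_y u_x, d_y u_y, d_y u_z
        [0.0, 0.0, 0.0]]    # d_z u_x, d_z u_y, d_z u_z
print(vorticity(grad))
print(strain(grad))
print(reynolds_stress([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

In a real post-processor the gradient would come from the simulation's own differentiation scheme, and the average would run over whichever coordinates are statistically homogeneous.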

We desire to compute quantities such as the above using data from several families
of turbulence simulations. These simulations solve either the incompressible or
compressible Navier-Stokes equations for a variety of geometries and boundary
conditions. Different geometries and boundary conditions imply that the velocity field
is represented on different grids or with different orthogonal function expansions
(for spectral methods). Even though these simulations solve roughly the same
underlying equations, such dissimilarities in geometry and boundary conditions
require dissimilar numerical methods and data management schemes. Some
simulations use orthogonal functions (e.g. Fourier, Chebyshev, Jacobi eigenfunctions)
to satisfy the boundary conditions; derivatives are calculated spectrally. Other
simulations use finite-difference methods to calculate derivatives and are likely set
in complex geometries (relative to the spectral simulations). Certain simulations
use curvilinear grids. Some simulations evolve the evolution equation for $u$; others
evolve its curl, $\omega$. Thus, some databases contain the velocity field $u$ itself
while others contain its curl. On a more mundane level, the simulations are performed
on several different super-computers (Cray Y-MP, Intel Hypercube, Thinking Machines
CM-2) and retain some degree of machine specificity even at the database
level (e.g. machine byte-order, floating point format, machine-specific optimized
Fourier transform routines, etc.).





1.3 Current post-processing

Currently all post-processing of turbulence data is done “by hand.” That is, for
each simulation and for each desired quantity, someone must either add the required
code to an existing post-processor or develop a specific new post-processor, perhaps
with an existing one as a model. If the databases in question were small and simple,
either of these options would be straightforward. Since the databases are very large
and have numerical quirks, both options involve significant effort.

A simple example will illustrate this. Suppose we desire to calculate the
physical-space pressure $p(x)$ given a wave-space velocity field snapshot $\hat{u}(k)$
from an isotropic turbulence database (the simplest to post-process in the above
table). Here is an outline of what must be done to calculate
$p(x) = -\nabla^{-2} \nabla \cdot (u \cdot \nabla) u$:

• Read in $\hat{u}(k)$ in $k_z$-$k_y$ planes and calculate the necessary $y$
derivatives in wave space (multiplying by $ik_y$).

• Fourier transform these derivatives from wave $y$ space to physical $y$ space. This
is the first of three sub-transform steps that make up a full three-dimensional Fourier
transform.

• Calculate the necessary $x$ and $z$ derivatives (multiplying by $ik_x$ and $ik_z$).

• Fourier transform both $x$ and $z$ axes into physical space. At this point, all of
the required velocity derivatives are in physical space.

• Form the source term in physical space.

• Transform the source term, now fully calculated, back into full wave space and
invert $\nabla^2$ (divide by $-k^2$).

• Transform the result back to full physical space.
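The wave-space operations in these steps (multiply by $ik$ to differentiate, divide by $-k^2$ to invert the Laplacian) can be illustrated with a one-dimensional analogue. The sketch below is not the three-dimensional, plane-by-plane procedure described above: it uses a naive discrete Fourier transform on a 16-point periodic grid, and every function name in it is an illustrative assumption.

```python
import cmath
import math

def dft(values, sign):
    """Naive O(N^2) discrete Fourier transform (unnormalized); sign=-1 forward, +1 inverse."""
    n = len(values)
    return [sum(values[m] * cmath.exp(sign * 2j * math.pi * m * k / n)
                for m in range(n))
            for k in range(n)]

def wavenumbers(n):
    """Integer wavenumbers in transform order for a 2*pi-periodic grid."""
    return [k if k < n // 2 else k - n for k in range(n)]

def to_wave(f):
    """Physical space -> wave space (normalized Fourier coefficients)."""
    n = len(f)
    return [c / n for c in dft(f, -1)]

def to_phys(fhat):
    """Wave space -> physical space, keeping the real part."""
    return [c.real for c in dft(fhat, +1)]

def derivative(fhat, ks):
    """Differentiate in wave space: multiply each mode by i*k."""
    return [1j * k * c for k, c in zip(ks, fhat)]

def inverse_laplacian(shat, ks):
    """Invert d^2/dx^2 in wave space: divide by -k^2; the k = 0 mode is set to zero."""
    return [0.0 if k == 0 else c / (-k * k) for k, c in zip(ks, shat)]

n = 16
x = [2 * math.pi * m / n for m in range(n)]
ks = wavenumbers(n)
u = [math.sin(xm) for xm in x]

# Derivative in wave space, product in physical space, then back to wave space:
du = to_phys(derivative(to_wave(u), ks))
nonlinear = [a * b for a, b in zip(u, du)]        # u du/dx
s = to_phys(derivative(to_wave(nonlinear), ks))   # source term d/dx (u du/dx)
p = [-v for v in to_phys(inverse_laplacian(to_wave(s), ks))]

# For u = sin(x) the source is cos(2x), so p should equal cos(2x)/4.
print(max(abs(pm - math.cos(2 * xm) / 4) for pm, xm in zip(p, x)))
```

The product is formed in physical space and the derivatives in wave space, which is why the procedure alternates between the two representations.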

Much of the complexity of this example stems from the fact that the complete
velocity field is too large to fit into even a super-computer’s central memory; thus,
the data must be split into “pencils” or “planes” of one- or two-dimensional data.
For more complex databases, even more steps must be taken to perform a similar
computation: for example, for certain simulations the physical space product must
be dealiased on a 3/2 size grid; for others the derivatives involve Chebyshev and
Fourier transforms rather than just Fourier transforms as above.
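The idea of working through a field one plane at a time can be sketched as follows. The file layout (raw doubles written plane by plane), the helper names, and the tiny grid size are assumptions made for illustration; actual databases use their own formats.

```python
import array
import os
import tempfile

def write_field(path, n, value_at):
    """Write an n^3 scalar field to disk as raw doubles, one xy-plane per z index."""
    with open(path, "wb") as f:
        for z in range(n):
            plane = array.array(
                "d", (value_at(x, y, z) for y in range(n) for x in range(n)))
            plane.tofile(f)

def iter_planes(path, n):
    """Yield one xy-plane (n*n doubles) at a time; the full field is never in memory."""
    with open(path, "rb") as f:
        for _ in range(n):
            plane = array.array("d")
            plane.fromfile(f, n * n)
            yield plane

def mean_square(path, n):
    """Accumulate the volume average of f^2 using one plane of memory at a time."""
    total = 0.0
    for plane in iter_planes(path, n):
        total += sum(v * v for v in plane)
    return total / n ** 3

n = 8
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "field.bin")
    write_field(path, n, lambda x, y, z: float(z))  # test field f = z
    result = mean_square(path, n)
print(result)  # 17.5, the mean of z^2 for z = 0..7
```

Any operation that couples points across planes, such as a transform along the third axis, needs a second pass with a different plane orientation, which is where most of the bookkeeping comes from.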

Considering the above example, one can see that a post-processor which computes
many quantities can become significantly complex and inscrutable to the uninitiated.
In fact, for certain simulations the post-processing software is a more complex
code than the simulation code itself. This is particularly true when time and space
optimization issues are important. It is important to realize, however, that these
complexities can all be understood and are fairly algorithmic. In particular, it is
plausible that an expert system (such as the Tensoral language) can be taught to
generate code to perform the above and similar post-processing tasks.

To summarize, the post-processing of turbulence data involves performing tensor
calculus and statistics on a number of dissimilar numerical dataset types. Numerical
operations must be performed in a manner consistent with the simulation which
generated the database. Currently post-processors are written entirely by hand
and are specific not only to the simulation in question but also to one or more
particular quantities of interest. Moreover, these post-processors are quite complex
codes in their own right and present significant barriers to the uninitiated who
desire to distill scientific understanding from the myriad computational details. It
is entirely plausible that this task (the creation of database post-processors)
can be automated.

2. Accomplishments: Tensoral design and μTensoral implementation

2.1 Tensoral by example 

The best way to introduce a new computer language is by example; suppose we
desire to study the role of the pressure strain term $\langle p S_{ij} \rangle$ in the
mean Reynolds stress $\langle u_i u_j \rangle$ evolution

$$\frac{\partial}{\partial t} \langle R_{ij} \rangle = 2 \langle p S_{ij} \rangle + \nu \langle u_j \nabla^2 u_i + u_i \nabla^2 u_j \rangle .$$

This hypothetical study would calculate $\langle p S_{ij} \rangle$ for various
turbulence databases. To calculate $\langle p S_{ij} \rangle$ given a database file db,
one would code the following Tensoral program in a file ps.tl:

Line 1   A_ij = db:u_i,j
Line 2   S_ij = (A_ij + A_ji)/2
Line 3   w_k = 1/2 epsilon_ijk A_ij
Line 4   p = -unlaplacian(S_ij*S_ji - w_k*w_k)
Line 5   print "Mean pressure strain", <p S_ij>

Line 1 defines the velocity gradient tensor A_ij. Lines 2 and 3 form the strain
tensor S_ij and vorticity w_k. Line 4 inverts the Poisson equation for the pressure p.
Line 5 averages the pressure strain and writes the result to the console. To complete
our study, we would run this program for several database files from several different
simulations.


2.2 Tensoral design: from top to bottom 

Exactly how is the program ps.tl turned into an efficient post-processor to perform
the intended task? The overall answer to this question is illustrated in figure 1
and is described in what follows. The Tensoral compiler takes the program supplied
by the end database user (e.g. ps.tl), determines the appropriate numerical methods
and data management techniques for the database file db (found in a “database
description”), and uses this information to output a post-processor in a lower-level
language (relative to Tensoral). This low-level post-processor is automatically
generated and may in principle be written in any sufficiently powerful language such
as Vectoral, Fortran, or C; we use the Vectoral language in our prototype because of
several of its unique features. Finally, this low-level post-processor is compiled and
combined with the requisite library routines (e.g. Fourier, Chebyshev, Jacobi
transforms, Poisson solvers) to make an executable post-processor which can then be
used to visualize data, to make graphs, or to transport post-processed data to other
sites.
Figure 1


Clearly, the most involved step of this automated system is the generation of a
low-level Vectoral post-processor given a high-level Tensoral program. This step
is performed by the Tensoral compiler, which compiles very high-level tensor
operations into a low-level “machine code” (called μTensoral). μTensoral machine
code consists of primitive database operations such as reading a pencil or plane
of data into main memory, Fourier transforming that pencil or plane, etc.; these
primitive operations are defined in the database description. Seen this way, the
Tensoral compiler is an expert system which figures out how to generate and order
primitive μTensoral database operations to accomplish a given task. If the database
description is properly coded, the compiler should be able to generate code which
is nearly as efficient as that generated by hand.


2.3 Tensoral implementation: from the bottom up 

Implementation of Tensoral proceeds from the bottom up. That is, my goal
is to develop a fully functional and robust low-level μTensoral machine code before
launching into the development of the Tensoral compiler which will eventually
generate this low-level code. Using this approach, the design outlined above can be
proved and tuned from its foundations up. In particular, database descriptions and
μTensoral codes can be written and refined as experience is gained.

Currently a functional μTensoral system has been implemented. This system
is built atop a small Lisp interpreter. The descriptions of numerical methods and
data management schemes provided to μTensoral by database descriptions drive
Lisp code which sets up an environment in which μTensoral code is then executed.
Thus slight differences between simulations can be conditionalized in Lisp (using if
statements, for example) so that one database description file can actually handle
multiple related simulations. Also, using such conditionals one can write μTensoral
programs which are common between closely related simulations. The Lisp system
is only seen by those programming in μTensoral and writing database descriptions,
and only then in the remote background. In particular, Lisp will never be seen by
Tensoral programmers and almost never by μTensoral programmers.

A μTensoral program to calculate vorticity $\omega(x,t) = \nabla \times u(x,t)$, given
the spectral velocity field, is coded as follows:

(define-tensor w 1)

; Read u in xy planes; calculate y derivatives;
; transform to physical y.
(loop-memxy
  (bs->memxy u_1 u_3)
  (->memxy w_3 w_1)
  (= w_1 (diff_2 u_3))
  (= w_3 (diff_2 u_1))
  (wavey->physy- w_1 w_3)
  (memxy->bs w_1 w_3))

; Read u in xz planes; calculate other derivatives;
; form vorticity.
(loop-memxz
  (bs->memxz u)
  (bs->memxz w)
  (= w_1 (- w_1 (diff_3 u_2)))
  (= w_2 (- (diff_3 u_1) (diff_1 u_3)))
  (= w_3 (- (diff_1 u_2) w_3))
  (wavez->physz- w)
  (wavex->physx- w)
  (memxz->bs w))

This program is essentially a high-level “shorthand” for the two passes through
the database needed to calculate vorticity. It is particularly noteworthy that the
roughly 20-line μTensoral shorthand shown here generates an approximately 180-line
Vectoral program. This large code expansion indicates that even the embryonic
Tensoral system can be used as a useful post-processing tool, even though
it assumes a programmer be familiar with the precise numerics of a simulation (as
represented by its database description).
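The bookkeeping of the two passes can be checked independently of any transforms or disk traffic. The sketch below assumes a velocity field with a constant gradient, stored as `grad[a][i]` for $\partial_a u_i$, and mimics the order of assignments in the program above; the Python helper `diff` is a stand-in written for this check and is not the μTensoral primitive.

```python
def diff(axis, component, grad):
    """Stand-in for a derivative: d_axis u_component, with 1-based indices as in the listing."""
    return grad[axis - 1][component - 1]

def two_pass_vorticity(grad):
    """Accumulate the curl in the same order as the two database passes."""
    w = {}
    # Pass 1 (xy planes): only y derivatives are available.
    w[1] = diff(2, 3, grad)
    w[3] = diff(2, 1, grad)
    # Pass 2 (xz planes): x and z derivatives complete each component.
    w[1] = w[1] - diff(3, 2, grad)
    w[2] = diff(3, 1, grad) - diff(1, 3, grad)
    w[3] = diff(1, 2, grad) - w[3]
    return [w[1], w[2], w[3]]

def direct_curl(grad):
    """Curl written out directly: omega_k = eps_kij d_i u_j."""
    return [grad[1][2] - grad[2][1],
            grad[2][0] - grad[0][2],
            grad[0][1] - grad[1][0]]

# Arbitrary made-up gradient values, used only to compare the two routes.
grad = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 10.0]]
print(two_pass_vorticity(grad) == direct_curl(grad))  # True
```

The first pass stores partial results for two components because only $y$ derivatives are available while the data is held in $xy$ planes; the second pass finishes all three.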

Current work focuses on extending μTensoral. Currently μTensoral only “knows”
about tensor algebra and calculus but not statistics. General statistical functions
are now being added to average over combinations of x, y, and z coordinates to form
correlations and probability density functions (PDFs). Thus far only one database
description has been completed; others will follow.
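To make the kind of statistics meant here concrete, the following sketch averages a small field over a chosen combination of coordinates and forms a histogram estimate of a PDF. It assumes a nested-list field indexed as `field[ix][iy][iz]`; the function names and interface are illustrative and do not describe the statistical functions being added to the system.

```python
from collections import defaultdict

def average_over(field, axes):
    """Average field[ix][iy][iz] over the coordinates named in axes, e.g. "xz".

    Returns a dict mapping the remaining (un-averaged) indices to the mean value.
    """
    keep = [a for a in "xyz" if a not in axes]
    sums, counts = defaultdict(float), defaultdict(int)
    for ix, plane in enumerate(field):
        for iy, row in enumerate(plane):
            for iz, value in enumerate(row):
                index = {"x": ix, "y": iy, "z": iz}
                key = tuple(index[a] for a in keep)
                sums[key] += value
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def pdf(values, lo, hi, bins):
    """Histogram estimate of a probability density on [lo, hi) with equal-width bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)  # assumes at least one value falls inside [lo, hi)
    return [c / (total * width) for c in counts]

# A 2 x 3 x 2 field whose value equals its y index.
field = [[[float(iy) for _ in range(2)] for iy in range(3)] for _ in range(2)]
print(average_over(field, "xz"))   # a profile in y
print(average_over(field, "xyz"))  # a single volume average
print(pdf([0.1, 0.2, 0.6, 0.9], 0.0, 1.0, 2))
```

Averaging over "xz" leaves a profile in y, the form a mean velocity profile would take in a flow that is homogeneous in x and z.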


